On using Page Cooccurrences for Computing Clickstream Similarity
Abstract
Clickstream analysis provides valuable insight into user behavior, which can be translated into better business opportunities and increased user satisfaction. A fundamental problem in clickstream analysis is computing the distance (or similarity) between two clickstreams. Although a considerable body of literature proposes methods for computing path similarity, these methods rely on the edit distance, or the related longest common subsequence, to align the two clickstreams. The edit distance provides a least-cost sequence of transformations that makes the two clickstreams identical. Measures of path similarity are often defined on these “aligned” clickstreams. However, the replacement cost used in the edit distance's “alignment” process is assumed to be fixed and ignores the degree of similarity between the two page views. This paper proposes a method for computing the replacement cost, based on the assumption that the degree of similarity between two page views is proportional to their relative frequency of cooccurrence. We define a method for obtaining the replacement cost of two arbitrary web pages that takes into account the order of the sequence as well as the time spent on each page. Although the proposed method is less accurate than content-based analysis, our experiments with data generated from a simulator as well as data from an actual web site show that our assumption is well founded and that the method provides a fast and accurate way of computing the similarity between two page views.
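The core idea of the abstract, an edit distance whose replacement cost shrinks as two pages cooccur more often, can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: the function names, the unit insert/delete cost, and the use of a session-level Jaccard ratio as the "relative frequency of cooccurrence". The paper's method also accounts for page order and time spent on each page, which this sketch leaves out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_similarity(sessions):
    """Build sim(a, b) in [0, 1] from session-level cooccurrence counts.

    Assumption: similarity is the Jaccard ratio of the sets of sessions
    containing each page. This is a stand-in, not the paper's formula.
    """
    page_count = Counter()  # number of sessions containing each page
    pair_count = Counter()  # number of sessions containing both pages
    for session in sessions:
        pages = sorted(set(session))
        page_count.update(pages)
        pair_count.update(combinations(pages, 2))

    def sim(a, b):
        if a == b:
            return 1.0
        both = pair_count[tuple(sorted((a, b)))]
        either = page_count[a] + page_count[b] - both
        return both / either if either else 0.0

    return sim


def weighted_edit_distance(s, t, sim, indel_cost=1.0):
    """Edit distance where replacing page a with page b costs 1 - sim(a, b)."""
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + indel_cost,              # delete a
                curr[j - 1] + indel_cost,          # insert b
                prev[j - 1] + (1.0 - sim(a, b)),   # replace a with b
            ))
        prev = curr
    return prev[-1]


# Hypothetical sessions, invented purely for illustration.
sessions = [
    ["home", "shoes", "cart"],
    ["home", "boots", "cart"],
    ["home", "shoes", "boots"],
    ["news", "sports"],
]
sim = cooccurrence_similarity(sessions)
d_related = weighted_edit_distance(
    ["home", "shoes", "cart"], ["home", "boots", "cart"], sim)
d_unrelated = weighted_edit_distance(
    ["home", "shoes", "cart"], ["home", "news", "cart"], sim)
```

With a fixed replacement cost, both pairs of clickstreams would be equally far apart. Here the pair that differs only in "shoes" versus "boots", two pages that share a session, comes out closer than the pair that differs in "shoes" versus "news", which never cooccur.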
Similar resources
Minimal Knowledge Anonymous User Profiling for Personalized Services
The paper presents a solution to the problem of application of user profiles for anonymous internet users. The basic assumption is that only minimal knowledge about the user is given, i.e. information such as user session, user tracing and clickstream analysis is not available. This situation is of great interest because it characterizes most internet users, such as users of search engines. In th...
Hybrid Swarm Intelligence-Based Biclustering Approach for Recommendation of Web Pages
This chapter focuses on recommender systems based on the coherent user’s browsing patterns. Biclustering approach is used to discover the aggregate usage profiles from the preprocessed Web data. A combination of Discrete Artificial Bees Colony Optimization and Simulated Annealing technique is used for optimizing the aggregate usage profiles from the preprocessed clickstream data. Web page recom...
Does banner advertising affect browsing for brands? Clickstream choice model says yes, for some
This paper investigates how exposure to Internet display advertising affects the subsequent choices users make of brand-specific pages to view within a website. Using individual-level clickstream data from a third-party automotive website, we tracked the web pages selected by users as they browsed the site and their exposures to premium placement display ads for different vehicle makes (e.g., F...
You Are How You Click: Clickstream Analysis for Sybil Detection
Fake identities and Sybil accounts are pervasive in today’s online communities. They are responsible for a growing number of threats, including fake product reviews, malware and spam on social networks, and astroturf political campaigns. Unfortunately, studies show that existing tools such as CAPTCHAs and graph-based Sybil detectors have not proven to be effective defenses. In this paper, we de...
Discovery of Significant Usage Patterns from Clusters of Clickstream Data
Discovery of usage patterns from Web data is one of the primary purposes for Web Usage Mining. In this paper, a variation of “user preferred navigational trail” called Significant Usage Pattern (SUP) is proposed. SUPs are patterns that are extracted from clustered abstracted clickstream data, with a higher normalized probability of occurrence and may begin/end with specific Web page(s). The nov...
Publication date: 2003